2025-01-16 10:42:26, AIbase
Alibaba Qwen Team Releases New Process Reward Model, Advancing Mathematical Reasoning
The Alibaba Qwen team recently published a paper titled 'The Lessons of Developing Process Reward Models in Mathematical Reasoning' and released two new process reward models (PRMs) in the Qwen2.5-Math-PRM series, with 7B and 72B parameters respectively. Mathematical reasoning has long been a major challenge for large language models (LLMs), especially because of errors in intermediate reasoning steps. According to the team, the new models address limitations of existing PRM approaches in mathematical reasoning and improve the accuracy and generalization ability of reasoning models.